A Study on External Memory Scan-Based Skyline Algorithms
Abstract
Skyline queries return the set of non-dominated tuples, where a tuple is dominated if there exists another tuple with better values on all attributes. In the past few years the problem has been studied extensively, and a great number of external memory algorithms have been proposed. We thoroughly study the most important scan-based methods, which perform a number of passes over the database in order to extract the skyline. Although these algorithms are specifically designed to operate in external memory, many implementation details are neglected, and several design choices result in different flavors of these basic methods. We perform an extensive experimental evaluation using real and synthetic data. We conclude that specific design choices can have a significant impact on performance. We also demonstrate that, contrary to common belief, simpler skyline algorithms can be much faster than methods based on pre-processing.
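To make the notions of dominance and a scan-based pass concrete, the following is a minimal in-memory sketch, not any of the algorithms evaluated in the paper. It assumes that smaller values are better on every attribute and uses the common formal definition of dominance (no worse on all attributes, strictly better on at least one), which is slightly weaker than the informal wording above. The names `dominates` and `skyline_scan` are hypothetical. The bounded memory window, temporary files, and repeated passes of a real external memory method are not modelled.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """Return True if p dominates q, assuming smaller values are better."""
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline_scan(points: Iterable[Point]) -> List[Point]:
    """Single scan that keeps a window of candidate skyline points.

    Assumption: the window always fits in memory. A real external
    memory algorithm would bound the window and spill to disk.
    """
    window: List[Point] = []
    for p in points:
        # Discard p if some current candidate dominates it.
        if any(dominates(w, p) for w in window):
            continue
        # Otherwise evict the candidates that p dominates, then keep p.
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


if __name__ == "__main__":
    data = [(1, 5), (2, 2), (3, 3), (5, 1), (4, 4)]
    print(skyline_scan(data))
```

On the sample data, (3, 3) and (4, 4) are both dominated by (2, 2), so only the other three points remain in the window.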
Related resources
Output-sensitive Skyline Algorithms in External Memory
This paper presents new results in external memory for finding the skyline (a.k.a. maxima) of N points in d-dimensional space. The state of the art uses $O((N/B)\log^{d-2}_{M/B}(N/B))$ I/Os for fixed d ≥ 3, and $O((N/B)\log_{M/B}(N/B))$ I/Os for d = 2, where M and B are the sizes (in words) of memory and a disk block, respectively. We give algorithms whose running time depends on the number K of points in the...
Progressive skylining over Web-accessible databases
Skyline queries return a set of interesting data points that are not dominated on all dimensions by any other point. Most of the existing algorithms focus on skyline computation in centralized databases, and some of them can progressively return skyline points upon identification rather than all in a batch. Processing skyline queries over the Web is a more challenging task because in many Web a...
SkySuite: A Framework of Skyline-Join Operators for Static and Stream Environments
Efficient processing of skyline queries has been an area of growing interest over both static and stream environments. Most existing static and streaming techniques assume that the skyline query is applied to a single data source. Unfortunately, this is not true in many applications in which, due to the complexity of the schema, the skyline query may involve attributes belonging to multiple dat...
Maximal Vector Computation in Large Data Sets
Finding the maximals in a collection of vectors is relevant to many applications. The maximal set is related to the convex hull (and hence to linear optimization) and to nearest neighbors. The maximal vector problem has resurfaced with the advent of skyline queries for relational databases and skyline algorithms that are external and relationally well behaved. The initial algorithms proposed for maxi...
Approaching the Skyline in Z Order
Given a set of multidimensional data points, a skyline query retrieves the set of data points that are not dominated by any other point. This query is useful for multi-preference analysis and decision making. By analyzing the skyline query, we observe a close connection between the Z-order curve and skyline processing strategies and propose to use a new index structure called ZBtree, to index and stor...
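One well-known link between the Z-order curve and dominance can be sketched as follows; this is an illustration only, not the ZBtree index, whose details are cut off above. It assumes non-negative integer coordinates with smaller values being better, and the function name `z_address` and the `bits` parameter are hypothetical. The Z-address interleaves the bits of all coordinates, and if p is no larger than q on every coordinate and differs from q, then p has the smaller Z-address.

```python
from itertools import product
from typing import Sequence


def z_address(point: Sequence[int], bits: int = 16) -> int:
    """Morton code: interleave the bits of non-negative integer coordinates.

    Assumption: every coordinate fits in `bits` bits.
    """
    z = 0
    for level in range(bits - 1, -1, -1):  # most significant bit first
        for coord in point:
            z = (z << 1) | ((coord >> level) & 1)
    return z


def check_monotone(side: int = 8, bits: int = 3) -> bool:
    """Exhaustively check the property on a small 2-D grid."""
    grid = list(product(range(side), repeat=2))
    for p in grid:
        for q in grid:
            if p != q and all(a <= b for a, b in zip(p, q)):
                if not z_address(p, bits) < z_address(q, bits):
                    return False
    return True


if __name__ == "__main__":
    pts = [(3, 3), (0, 1), (2, 0), (1, 1)]
    print(sorted(pts, key=lambda p: z_address(p, bits=2)))
    print(check_monotone())
```

A consequence of this property is that, when points are scanned in ascending Z-address order, a point can only be dominated by points that appear earlier in the scan.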